GWASLab
  • Home
  • Tutorial
  • Sumstats manipulation
    • SumstatsObject
    • StatusCode
    • Standardization&normalization
    • QC&Filtering
    • Harmonization
    • Liftover
    • Data conversion
  • Visualization
    • Manhattan & QQ plot
    • Regional Plot
    • Brisbane Plot
    • Miami Plot
    • Scatter: effect size comparison
    • Scatter: allele frequency comparison
    • Heatmap: genetic correlation
    • Forest Plot
    • Gallery
  • Utilities
    • Extract Lead Variants
    • Extract Novel Variants
    • Format & Save
    • Load LDSC log
    • Convert Heritability
    • Infer Genome Build
  • References
    • Reference data
    • Common data
    • Update logs
  • Examples
    • Standardization and QC example
      • Load sample data
      • QC & Filtering
    • Harmonization example
      • Workflow
      • Liftover
    • Formatting and saving
    • Utility example
      • Data conversion
      • Get novel variants
    • Visualization example
      • Manhattan and Q-Q plot
      • Miami plot
      • Regional plot
      • Brisbane plot
  • Previous
  • Next
  • GitHub
  • Regional plot

Regional plot¶

In [1]:
Copied!
import sys
sys.path.insert(0,"/home/yunye/gwaslab/gwaslab/src")
import gwaslab as gl
import sys sys.path.insert(0,"/home/yunye/gwaslab/gwaslab/src") import gwaslab as gl
In [2]:
Copied!
!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
--2022-10-20 22:22:11--  http://jenger.riken.jp/14/
Resolving jenger.riken.jp (jenger.riken.jp)... 134.160.84.25
Connecting to jenger.riken.jp (jenger.riken.jp)|134.160.84.25|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 274187574 (261M) [text/plain]
Saving to: ‘t2d_bbj.txt.gz’

t2d_bbj.txt.gz      100%[===================>] 261.49M  10.6MB/s    in 25s     

2022-10-20 22:22:36 (10.6 MB/s) - ‘t2d_bbj.txt.gz’ saved [274187574/274187574]

In [2]:
Copied!
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
             snpid="SNP",
             chrom="CHR",
             pos="POS",
             ea="ALT",
             nea="REF",
             neaf="Frq",
             beta="BETA",
             se="SE",
             p="P",
             direction="Dir",
             n="N")
mysumstats = gl.Sumstats("t2d_bbj.txt.gz", snpid="SNP", chrom="CHR", pos="POS", ea="ALT", nea="REF", neaf="Frq", beta="BETA", se="SE", p="P", direction="Dir", n="N")
Fri Oct 21 01:15:35 2022 Start to initiate from file :t2d_bbj.txt.gz
Fri Oct 21 01:15:52 2022  -Reading columns          : ALT,CHR,SNP,N,Frq,Dir,REF,BETA,POS,SE,P
Fri Oct 21 01:15:52 2022  -Renaming columns to      : EA,CHR,SNPID,N,EAF,DIRECTION,NEA,BETA,POS,SE,P
Fri Oct 21 01:15:52 2022  -Current dataframe shape  : Rows  12557761  x  11  Columns
Fri Oct 21 01:15:53 2022  -Initiating a status column ...
Fri Oct 21 01:15:54 2022  -Reordering columns to    : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
Fri Oct 21 01:15:55 2022  -NEAF is specified...
Fri Oct 21 01:15:55 2022  -Checking if 0<= NEAF <=1 ...
Fri Oct 21 01:15:57 2022  -Converted NEAF to EAF.
Fri Oct 21 01:15:57 2022  -Removed 0 variants with bad NEAF.
Fri Oct 21 01:15:57 2022 Finished loading data successfully!
In [4]:
Copied!
mysumstats.plot_mqq(skip=3,cut=20, mode="mqq",stratified=True)
mysumstats.plot_mqq(skip=3,cut=20, mode="mqq",stratified=True)
Thu Oct 20 21:13:27 2022 Start to plot manhattan/qq plot with the following basic settings:
Thu Oct 20 21:13:27 2022  -Genome-wide significance level is set to 5e-08 ...
Thu Oct 20 21:13:27 2022  -Raw input contains 12557761 variants...
Thu Oct 20 21:13:27 2022  -Plot layout mode is : mqq
Thu Oct 20 21:13:36 2022 Finished loading specified columns from the sumstats.
Thu Oct 20 21:13:36 2022 Start conversion and sanity check:
Thu Oct 20 21:13:38 2022  -Removed 328791 variants with nan in CHR or POS column ...
Thu Oct 20 21:13:38 2022  -Removed 0 variants with nan in EAF column ...
Thu Oct 20 21:13:39 2022  -Removed 0 variants with nan in P column ...
Thu Oct 20 21:13:39 2022  -P values are being converted to -log10(P)...
Thu Oct 20 21:13:39 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Thu Oct 20 21:13:41 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Thu Oct 20 21:13:43 2022  -Maximum -log10(P) values is 167.58838029403677 .
Thu Oct 20 21:13:43 2022  -Minus log10(P) values above 20 will be shrunk with a shrinkage factor of 10...
Thu Oct 20 21:13:43 2022 Finished data conversion and sanity check.
Thu Oct 20 21:13:43 2022 Start to create manhattan plot with 87543 variants:
Thu Oct 20 21:13:44 2022  -Found 84 significant variants with a sliding window size of 500 kb...
Thu Oct 20 21:13:44 2022 Finished creating Manhattan plot successfully!
Thu Oct 20 21:13:44 2022  -Skip annotating
Thu Oct 20 21:13:44 2022 Start to create QQ plot with 87543 variants:
Thu Oct 20 21:13:45 2022  -Calculating GC using P : 1.2139048028292598
Thu Oct 20 21:13:45 2022 Finished creating QQ plot successfully!
Out[4]:
(<Figure size 1500x500 with 2 Axes>, <gwaslab.Log.Log at 0x7ff2b0428e20>)
In [3]:
Copied!
mysumstats.filter_in(eq={"CHR":"7"})
mysumstats.basic_check()
mysumstats.filter_in(eq={"CHR":"7"}) mysumstats.basic_check()
Fri Oct 21 01:15:57 2022 Start filtering values:
Fri Oct 21 01:15:58 2022  -Keeping 707780 variants with CHR = 7 ...
Fri Oct 21 01:15:58 2022 Finished filtering values.
Fri Oct 21 01:15:59 2022 Start to check IDs...
Fri Oct 21 01:15:59 2022  -Current Dataframe shape : 707780  x  12
Fri Oct 21 01:15:59 2022  -Checking if SNPID is chr:pos:ref:alt...(separator: - ,: , _)
Fri Oct 21 01:16:00 2022 Finished checking IDs successfully!
Fri Oct 21 01:16:00 2022 Start to fix chromosome notation...
Fri Oct 21 01:16:00 2022  -Current Dataframe shape : 707780  x  12
Fri Oct 21 01:16:01 2022  -All CHR are already fixed...
Fri Oct 21 01:16:02 2022 Finished fixing chromosome notation successfully!
Fri Oct 21 01:16:02 2022 Start to fix basepair positions...
Fri Oct 21 01:16:02 2022  -Current Dataframe shape : 707780  x  12
Fri Oct 21 01:16:03 2022  -Position upper_bound is: 250,000,000
Fri Oct 21 01:16:04 2022  -Remove outliers: 0
Fri Oct 21 01:16:04 2022  -Converted all position to datatype Int64.
Fri Oct 21 01:16:04 2022 Finished fixing basepair position successfully!
Fri Oct 21 01:16:04 2022 Start to fix alleles...
Fri Oct 21 01:16:04 2022  -Current Dataframe shape : 707780  x  12
Fri Oct 21 01:16:05 2022  -Converted all bases to string datatype and UPPERCASE.
Fri Oct 21 01:16:05 2022 Finished fixing allele successfully!
Fri Oct 21 01:16:05 2022 Start sanity check for statistics ...
Fri Oct 21 01:16:05 2022  -Current Dataframe shape : 707780  x  12
Fri Oct 21 01:16:05 2022  -Checking if N is Int64 and N>0 ...
Fri Oct 21 01:16:05 2022  -Removed 0 variants with bad N.
Fri Oct 21 01:16:05 2022  -Checking if 0<= EAF <=1 ...
Fri Oct 21 01:16:06 2022  -Removed 0 variants with bad EAF.
Fri Oct 21 01:16:06 2022  -Checking if MAC >=5 ...
Fri Oct 21 01:16:06 2022  -Removed 0 variants with bad MAC.
Fri Oct 21 01:16:06 2022  -Checking if 0< P <5e-300 ...
Fri Oct 21 01:16:06 2022  -Removed 0 variants with bad P.
Fri Oct 21 01:16:06 2022  -Checking if abs(BETA)<10 ...
Fri Oct 21 01:16:06 2022  -Removed 0 variants with bad BETA.
Fri Oct 21 01:16:06 2022  -Checking if SE >0 ...
Fri Oct 21 01:16:06 2022  -Removed 0 variants with bad SE.
Fri Oct 21 01:16:06 2022  -Checking STATUS...
Fri Oct 21 01:16:06 2022  -Coverting STAUTUS to category.
Fri Oct 21 01:16:06 2022  -Removed 0 variants with bad statistics in total.
Fri Oct 21 01:16:06 2022 Finished sanity check successfully!
Fri Oct 21 01:16:06 2022 Start to normalize variants...
Fri Oct 21 01:16:06 2022  -Current Dataframe shape : 707780  x  12
Fri Oct 21 01:16:07 2022  -No available variants to normalize..
Fri Oct 21 01:16:07 2022 Finished normalizing variants successfully!
In [7]:
Copied!
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl",build="19")
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl",build="19")
Thu Oct 20 21:14:02 2022 Start to plot manhattan/qq plot with the following basic settings:
Thu Oct 20 21:14:02 2022  -Genome-wide significance level is set to 5e-08 ...
Thu Oct 20 21:14:02 2022  -Raw input contains 707780 variants...
Thu Oct 20 21:14:02 2022  -Plot layout mode is : m
Thu Oct 20 21:14:02 2022 Finished loading specified columns from the sumstats.
Thu Oct 20 21:14:02 2022 Start conversion and sanity check:
Thu Oct 20 21:14:02 2022  -Removed 0 variants with nan in CHR or POS column ...
Thu Oct 20 21:14:02 2022  -Removed 0 variants with nan in P column ...
Thu Oct 20 21:14:02 2022  -P values are being converted to -log10(P)...
Thu Oct 20 21:14:02 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Thu Oct 20 21:14:02 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Thu Oct 20 21:14:02 2022  -Maximum -log10(P) values is 73.38711023071251 .
Thu Oct 20 21:14:02 2022 Finished data conversion and sanity check.
Thu Oct 20 21:14:03 2022 Start to create manhattan plot with 707741 variants:
INFO:pyensembl.database:Creating database: /home/yunye/gwaslab/gwaslab/src/gwaslab/data/Ensembl/release75/Homo_sapiens.GRCh37.75.protein_coding.gtf.db
INFO:pyensembl.database:Reading GTF from /home/yunye/gwaslab/gwaslab/src/gwaslab/data/Ensembl/release75/Homo_sapiens.GRCh37.75.protein_coding.gtf.gz
Thu Oct 20 21:14:04 2022  -Found 7 significant variants with a sliding window size of 500 kb...
Thu Oct 20 21:14:04 2022 Start to annotate variants with nearest gene name(s)...
Thu Oct 20 21:14:04 2022  -Assigning Gene name using built-in Ensembl Release 75  (hg19)
/home/yunye/anaconda3/lib/python3.9/site-packages/gtfparse/read_gtf.py:82: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.


  chunk_iterator = pd.read_csv(
/home/yunye/anaconda3/lib/python3.9/site-packages/gtfparse/read_gtf.py:82: FutureWarning: The warn_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.


  chunk_iterator = pd.read_csv(
INFO:root:Extracted GTF attributes: ['gene_id', 'transcript_id', 'gene_name', 'gene_biotype', 'transcript_name', 'exon_number', 'exon_id', 'ccds_id', 'protein_id']
INFO:root:Using column 'source' to replace missing 'transcript_biotype'
INFO:datacache.database_helpers:Creating database /home/yunye/gwaslab/gwaslab/src/gwaslab/data/Ensembl/release75/Homo_sapiens.GRCh37.75.protein_coding.gtf.db containing: stop_codon, start_codon, transcript, CDS, gene, exon
INFO:datacache.database:Running sqlite query: "CREATE TABLE stop_codon (transcript_name TEXT NOT NULL, end INT NOT NULL, exon_id TEXT NOT NULL, seqname TEXT NOT NULL, feature TEXT NOT NULL, source TEXT NOT NULL, ccds_id TEXT NOT NULL, transcript_id TEXT NOT NULL, start INT NOT NULL, strand TEXT NOT NULL, exon_number TEXT NOT NULL, gene_id TEXT NOT NULL, transcript_biotype TEXT NOT NULL, gene_name TEXT NOT NULL, gene_biotype TEXT NOT NULL, protein_id TEXT NOT NULL)"
INFO:datacache.database:Inserting 73352 rows into table stop_codon
INFO:datacache.database:Creating index on stop_codon (seqname, start, end)
INFO:datacache.database:Creating index on stop_codon (gene_name)
INFO:datacache.database:Creating index on stop_codon (gene_id)
INFO:datacache.database:Creating index on stop_codon (transcript_id)
INFO:datacache.database:Creating index on stop_codon (transcript_name)
INFO:datacache.database:Creating index on stop_codon (exon_id)
INFO:datacache.database:Creating index on stop_codon (protein_id)
INFO:datacache.database:Creating index on stop_codon (ccds_id)
INFO:datacache.database:Running sqlite query: "CREATE TABLE start_codon (transcript_name TEXT NOT NULL, end INT NOT NULL, exon_id TEXT NOT NULL, seqname TEXT NOT NULL, feature TEXT NOT NULL, source TEXT NOT NULL, ccds_id TEXT NOT NULL, transcript_id TEXT NOT NULL, start INT NOT NULL, strand TEXT NOT NULL, exon_number TEXT NOT NULL, gene_id TEXT NOT NULL, transcript_biotype TEXT NOT NULL, gene_name TEXT NOT NULL, gene_biotype TEXT NOT NULL, protein_id TEXT NOT NULL)"
INFO:datacache.database:Inserting 73293 rows into table start_codon
INFO:datacache.database:Creating index on start_codon (seqname, start, end)
INFO:datacache.database:Creating index on start_codon (gene_name)
INFO:datacache.database:Creating index on start_codon (gene_id)
INFO:datacache.database:Creating index on start_codon (transcript_id)
INFO:datacache.database:Creating index on start_codon (transcript_name)
INFO:datacache.database:Creating index on start_codon (exon_id)
INFO:datacache.database:Creating index on start_codon (protein_id)
INFO:datacache.database:Creating index on start_codon (ccds_id)
INFO:datacache.database:Running sqlite query: "CREATE TABLE transcript (transcript_name TEXT NOT NULL, end INT NOT NULL, exon_id TEXT NOT NULL, seqname TEXT NOT NULL, feature TEXT NOT NULL, source TEXT NOT NULL, ccds_id TEXT NOT NULL, transcript_id TEXT UNIQUE PRIMARY KEY NOT NULL, start INT NOT NULL, strand TEXT NOT NULL, exon_number TEXT NOT NULL, gene_id TEXT NOT NULL, transcript_biotype TEXT NOT NULL, gene_name TEXT NOT NULL, gene_biotype TEXT NOT NULL, protein_id TEXT NOT NULL)"
INFO:datacache.database:Inserting 165581 rows into table transcript
INFO:datacache.database:Creating index on transcript (seqname, start, end)
INFO:datacache.database:Creating index on transcript (gene_name)
INFO:datacache.database:Creating index on transcript (gene_id)
INFO:datacache.database:Creating index on transcript (transcript_name)
INFO:datacache.database:Creating index on transcript (exon_id)
INFO:datacache.database:Creating index on transcript (protein_id)
INFO:datacache.database:Creating index on transcript (ccds_id)
INFO:datacache.database:Running sqlite query: "CREATE TABLE CDS (transcript_name TEXT NOT NULL, end INT NOT NULL, exon_id TEXT NOT NULL, seqname TEXT NOT NULL, feature TEXT NOT NULL, source TEXT NOT NULL, ccds_id TEXT NOT NULL, transcript_id TEXT NOT NULL, start INT NOT NULL, strand TEXT NOT NULL, exon_number TEXT NOT NULL, gene_id TEXT NOT NULL, transcript_biotype TEXT NOT NULL, gene_name TEXT NOT NULL, gene_biotype TEXT NOT NULL, protein_id TEXT NOT NULL)"
INFO:datacache.database:Inserting 791057 rows into table CDS
INFO:datacache.database:Creating index on CDS (seqname, start, end)
INFO:datacache.database:Creating index on CDS (gene_name)
INFO:datacache.database:Creating index on CDS (gene_id)
INFO:datacache.database:Creating index on CDS (transcript_id)
INFO:datacache.database:Creating index on CDS (transcript_name)
INFO:datacache.database:Creating index on CDS (exon_id)
INFO:datacache.database:Creating index on CDS (protein_id)
INFO:datacache.database:Creating index on CDS (ccds_id)
INFO:datacache.database:Running sqlite query: "CREATE TABLE gene (transcript_name TEXT NOT NULL, end INT NOT NULL, exon_id TEXT NOT NULL, seqname TEXT NOT NULL, feature TEXT NOT NULL, source TEXT NOT NULL, ccds_id TEXT NOT NULL, transcript_id TEXT NOT NULL, start INT NOT NULL, strand TEXT NOT NULL, exon_number TEXT NOT NULL, gene_id TEXT UNIQUE PRIMARY KEY NOT NULL, transcript_biotype TEXT NOT NULL, gene_name TEXT NOT NULL, gene_biotype TEXT NOT NULL, protein_id TEXT NOT NULL)"
INFO:datacache.database:Inserting 24159 rows into table gene
INFO:datacache.database:Creating index on gene (seqname, start, end)
INFO:datacache.database:Creating index on gene (gene_name)
INFO:datacache.database:Creating index on gene (transcript_id)
INFO:datacache.database:Creating index on gene (transcript_name)
INFO:datacache.database:Creating index on gene (exon_id)
INFO:datacache.database:Creating index on gene (protein_id)
INFO:datacache.database:Creating index on gene (ccds_id)
INFO:datacache.database:Running sqlite query: "CREATE TABLE exon (transcript_name TEXT NOT NULL, end INT NOT NULL, exon_id TEXT NOT NULL, seqname TEXT NOT NULL, feature TEXT NOT NULL, source TEXT NOT NULL, ccds_id TEXT NOT NULL, transcript_id TEXT NOT NULL, start INT NOT NULL, strand TEXT NOT NULL, exon_number TEXT NOT NULL, gene_id TEXT NOT NULL, transcript_biotype TEXT NOT NULL, gene_name TEXT NOT NULL, gene_biotype TEXT NOT NULL, protein_id TEXT NOT NULL)"
INFO:datacache.database:Inserting 1195297 rows into table exon
INFO:datacache.database:Creating index on exon (seqname, start, end)
INFO:datacache.database:Creating index on exon (gene_name)
INFO:datacache.database:Creating index on exon (gene_id)
INFO:datacache.database:Creating index on exon (transcript_id)
INFO:datacache.database:Creating index on exon (transcript_name)
INFO:datacache.database:Creating index on exon (exon_id)
INFO:datacache.database:Creating index on exon (protein_id)
INFO:datacache.database:Creating index on exon (ccds_id)
INFO:datacache.database:Running sqlite query: "CREATE TABLE _datacache_metadata (version INT)"
INFO:datacache.database:Running sqlite query: "INSERT INTO _datacache_metadata VALUES (3)"
Thu Oct 20 21:15:04 2022 Finished annotating variants with nearest gene name(s) successfully!
Thu Oct 20 21:15:04 2022 Finished creating Manhattan plot successfully!
Thu Oct 20 21:15:04 2022  -Annotating using column GENENAME...
Out[7]:
(<Figure size 1500x500 with 1 Axes>, <gwaslab.Log.Log at 0x7ff2b0428e20>)
In [8]:
Copied!
mysumstats.get_lead()
mysumstats.get_lead()
Thu Oct 20 21:15:29 2022 Start to extract lead variants...
Thu Oct 20 21:15:29 2022  -Processing 707780 variants...
Thu Oct 20 21:15:29 2022  -Significance threshold : 5e-08
Thu Oct 20 21:15:29 2022  -Sliding window size: 500  kb
Thu Oct 20 21:15:29 2022  -Found 1077 significant variants in total...
Thu Oct 20 21:15:30 2022  -Identified 7 lead variants!
Thu Oct 20 21:15:30 2022 Finished extracting lead variants successfully!
Out[8]:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS
5474489 7:13888699_G_C 7 13888699 G C 0.5680 0.0562 0.0094 2.507000e-09 191764 ++++ 9960099
5480067 7:14898282_C_T 7 14898282 C T 0.6012 0.0617 0.0088 2.336000e-12 191764 ++++ 9960099
5626346 7:44174857_T_G 7 44174857 G T 0.5985 -0.0640 0.0093 5.325000e-12 191764 ---- 9960099
5732279 7:69406661_A_T 7 69406661 T A 0.1981 -0.0900 0.0111 4.871000e-16 191764 ---- 9960099
5965364 7:127253550_C_T 7 127253550 C T 0.9081 0.2761 0.0152 4.101000e-74 191764 ++++ 9960099
5976830 7:130025713_G_A 7 130025713 G A 0.9530 -0.1365 0.0230 3.068000e-09 191764 ---- 9960099
6092347 7:157038803_A_G 7 157038803 G A 0.4626 -0.0502 0.0088 1.127000e-08 191764 ---- 9960099
In [4]:
Copied!
mysumstats.get_lead(anno=True)
mysumstats.get_lead(anno=True)
Thu Oct 20 22:41:42 2022 Start to extract lead variants...
Thu Oct 20 22:41:42 2022  -Processing 707780 variants...
Thu Oct 20 22:41:42 2022  -Significance threshold : 5e-08
Thu Oct 20 22:41:42 2022  -Sliding window size: 500  kb
Thu Oct 20 22:41:42 2022  -Found 1077 significant variants in total...
Thu Oct 20 22:41:42 2022  -Identified 7 lead variants!
Thu Oct 20 22:41:42 2022 Start to annotate variants with nearest gene name(s)...
Thu Oct 20 22:41:42 2022  -Assigning Gene name using built-in Ensembl Release 75  (hg19)
Thu Oct 20 22:41:44 2022 Finished annotating variants with nearest gene name(s) successfully!
Thu Oct 20 22:41:44 2022 Finished extracting lead variants successfully!
Out[4]:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS LOCATION GENE
5474489 7:13888699_G_C 7 13888699 G C 0.5680 0.0562 0.0094 2.507000e-09 191764 ++++ 9960099 42154 ETV1
5480067 7:14898282_C_T 7 14898282 C T 0.6012 0.0617 0.0088 2.336000e-12 191764 ++++ 9960099 0 DGKB
5626346 7:44174857_T_G 7 44174857 G T 0.5985 -0.0640 0.0093 5.325000e-12 191764 ---- 9960099 3606 MYL7
5732279 7:69406661_A_T 7 69406661 T A 0.1981 -0.0900 0.0111 4.871000e-16 191764 ---- 9960099 0 AUTS2
5965364 7:127253550_C_T 7 127253550 C T 0.9081 0.2761 0.0152 4.101000e-74 191764 ++++ 9960099 0 PAX4
5976830 7:130025713_G_A 7 130025713 G A 0.9530 -0.1365 0.0230 3.068000e-09 191764 ---- 9960099 0 CPA1
6092347 7:157038803_A_G 7 157038803 G A 0.4626 -0.0502 0.0088 1.127000e-08 191764 ---- 9960099 0 UBE3C
In [7]:
Copied!
mysumstats.plot_mqq(region=(7,156538803,157538803))
mysumstats.plot_mqq(region=(7,156538803,157538803))
Wed Sep 21 14:47:58 2022 Start to plot manhattan/qq plot with the following basic settings:
Wed Sep 21 14:47:58 2022  -Genome-wide significance level is set to 5e-08 ...
Wed Sep 21 14:47:58 2022  -Raw input contains 707780 variants...
Wed Sep 21 14:47:58 2022  -Plot layout mode is : mqq
Wed Sep 21 14:47:58 2022  -Region to plot : chr7:156538803-157538803.
Wed Sep 21 14:47:59 2022  -Extract SNPs in region : chr7:156538803-157538803...
Wed Sep 21 14:48:01 2022  -Extract SNPs in specified regions: 5831
Wed Sep 21 14:48:01 2022 Finished loading specified columns from the sumstats.
Wed Sep 21 14:48:01 2022 Start conversion and sanity check:
Wed Sep 21 14:48:01 2022  -Removed 0 variants with nan in CHR or POS column ...
Wed Sep 21 14:48:01 2022  -Removed 0 variants with nan in P column ...
Wed Sep 21 14:48:01 2022  -P values are being converted to -log10(P)...
Wed Sep 21 14:48:01 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Wed Sep 21 14:48:01 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Wed Sep 21 14:48:01 2022  -Maximum -log10(P) values is 7.948076083953893 .
Wed Sep 21 14:48:01 2022 Finished data conversion and sanity check.
Wed Sep 21 14:48:01 2022 Start to create manhattan plot with 5831 variants:
Wed Sep 21 14:48:02 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Wed Sep 21 14:48:02 2022  -Skip annotating
Wed Sep 21 14:48:02 2022 Finished creating Manhattan plot successfully!
Wed Sep 21 14:48:02 2022 Start to create QQ plot with 5831 variants:
Wed Sep 21 14:48:02 2022  -Calculating GC using P : 1.7191388184129959
Wed Sep 21 14:48:02 2022 Finished creating QQ plot successfully!
Out[7]:
(<Figure size 1500x500 with 3 Axes>, <gwaslab.Log.Log at 0x7fc6dd3f0220>)
In [10]:
Copied!
mysumstats.plot_mqq(mode="r", region=(7,156538803,157538803),region_grid=True, gtf_path="ensembl")
mysumstats.plot_mqq(mode="r", region=(7,156538803,157538803),region_grid=True, gtf_path="ensembl")
Thu Oct 20 21:16:14 2022 Start to plot manhattan/qq plot with the following basic settings:
Thu Oct 20 21:16:14 2022  -Genome-wide significance level is set to 5e-08 ...
Thu Oct 20 21:16:14 2022  -Raw input contains 707780 variants...
Thu Oct 20 21:16:14 2022  -Plot layout mode is : r
Thu Oct 20 21:16:14 2022  -Region to plot : chr7:156538803-157538803.
Thu Oct 20 21:16:14 2022  -Extract SNPs in region : chr7:156538803-157538803...
Thu Oct 20 21:16:16 2022  -Extract SNPs in specified regions: 5831
Thu Oct 20 21:16:16 2022 Finished loading specified columns from the sumstats.
Thu Oct 20 21:16:16 2022 Start conversion and sanity check:
Thu Oct 20 21:16:16 2022  -Removed 0 variants with nan in P column ...
Thu Oct 20 21:16:16 2022  -P values are being converted to -log10(P)...
Thu Oct 20 21:16:16 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Thu Oct 20 21:16:16 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Thu Oct 20 21:16:16 2022  -Maximum -log10(P) values is 7.948076083953893 .
Thu Oct 20 21:16:16 2022 Finished data conversion and sanity check.
Thu Oct 20 21:16:16 2022 Start to create manhattan plot with 5831 variants:
Thu Oct 20 21:16:16 2022  -Loading gtf files from:ensembl
Thu Oct 20 21:16:24 2022  -plotting gene track..
Thu Oct 20 21:16:24 2022  -Finished plotting gene track..
Thu Oct 20 21:16:25 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Thu Oct 20 21:16:25 2022 Finished creating Manhattan plot successfully!
Thu Oct 20 21:16:25 2022  -Skip annotating
Out[10]:
(<Figure size 1500x1000 with 3 Axes>, <gwaslab.Log.Log at 0x7ff2b0428e20>)
In [4]:
Copied!
lead_variants = mysumstats.get_lead()
lead_variants
lead_variants = mysumstats.get_lead() lead_variants
Fri Oct 21 01:16:07 2022 Start to extract lead variants...
Fri Oct 21 01:16:07 2022  -Processing 707780 variants...
Fri Oct 21 01:16:07 2022  -Significance threshold : 5e-08
Fri Oct 21 01:16:07 2022  -Sliding window size: 500  kb
Fri Oct 21 01:16:07 2022  -Found 1077 significant variants in total...
Fri Oct 21 01:16:07 2022  -Identified 7 lead variants!
Fri Oct 21 01:16:07 2022 Finished extracting lead variants successfully!
Out[4]:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS
5474489 7:13888699_G_C 7 13888699 G C 0.5680 0.0562 0.0094 2.507000e-09 191764 ++++ 9960099
5480067 7:14898282_C_T 7 14898282 C T 0.6012 0.0617 0.0088 2.336000e-12 191764 ++++ 9960099
5626346 7:44174857_T_G 7 44174857 G T 0.5985 -0.0640 0.0093 5.325000e-12 191764 ---- 9960099
5732279 7:69406661_A_T 7 69406661 T A 0.1981 -0.0900 0.0111 4.871000e-16 191764 ---- 9960099
5965364 7:127253550_C_T 7 127253550 C T 0.9081 0.2761 0.0152 4.101000e-74 191764 ++++ 9960099
5976830 7:130025713_G_A 7 130025713 G A 0.9530 -0.1365 0.0230 3.068000e-09 191764 ---- 9960099
6092347 7:157038803_A_G 7 157038803 G A 0.4626 -0.0502 0.0088 1.127000e-08 191764 ---- 9960099
In [8]:
Copied!
flank = 100000
for index,row in lead_variants.iterrows():
        mysumstats.plot_mqq(mode="r",region=(row["CHR"],row["POS"]-flank,row["POS"]+flank),region_grid=True,mqqratio=2,taf=[4,0,0.95,1,1],figargs={"figsize":(15,5),"dpi":300})
                        #vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz")
flank = 100000 for index,row in lead_variants.iterrows(): mysumstats.plot_mqq(mode="r",region=(row["CHR"],row["POS"]-flank,row["POS"]+flank),region_grid=True,mqqratio=2,taf=[4,0,0.95,1,1],figargs={"figsize":(15,5),"dpi":300}) #vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz")
Fri Oct 21 01:20:36 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:20:36 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:20:36 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:20:36 2022  -Plot layout mode is : r
Fri Oct 21 01:20:36 2022  -Region to plot : chr7:13788699-13988699.
Fri Oct 21 01:20:36 2022  -Extract SNPs in region : chr7:13788699-13988699...
Fri Oct 21 01:20:37 2022  -Extract SNPs in specified regions: 1225
Fri Oct 21 01:20:37 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:20:37 2022 Start conversion and sanity check:
Fri Oct 21 01:20:37 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:20:37 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:20:37 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:20:37 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:20:37 2022  -Maximum -log10(P) values is 8.600845666041783 .
Fri Oct 21 01:20:37 2022 Finished data conversion and sanity check.
Fri Oct 21 01:20:37 2022 Start to create manhattan plot with 1224 variants:
Fri Oct 21 01:20:37 2022  -Loading gtf files from:defualt
Fri Oct 21 01:20:46 2022  -plotting gene track..
Fri Oct 21 01:20:46 2022  -Finished plotting gene track..
Fri Oct 21 01:20:46 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:20:46 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:20:46 2022  -Skip annotating
Fri Oct 21 01:20:46 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:20:46 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:20:46 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:20:46 2022  -Plot layout mode is : r
Fri Oct 21 01:20:46 2022  -Region to plot : chr7:14798282-14998282.
Fri Oct 21 01:20:46 2022  -Extract SNPs in region : chr7:14798282-14998282...
Fri Oct 21 01:20:47 2022  -Extract SNPs in specified regions: 993
Fri Oct 21 01:20:47 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:20:47 2022 Start conversion and sanity check:
Fri Oct 21 01:20:47 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:20:47 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:20:47 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:20:47 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:20:47 2022  -Maximum -log10(P) values is 11.631527161559639 .
Fri Oct 21 01:20:47 2022 Finished data conversion and sanity check.
Fri Oct 21 01:20:47 2022 Start to create manhattan plot with 993 variants:
Fri Oct 21 01:20:47 2022  -Loading gtf files from:defualt
Fri Oct 21 01:20:56 2022  -plotting gene track..
Fri Oct 21 01:20:56 2022  -Finished plotting gene track..
Fri Oct 21 01:20:56 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:20:56 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:20:56 2022  -Skip annotating
Fri Oct 21 01:20:56 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:20:56 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:20:56 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:20:56 2022  -Plot layout mode is : r
Fri Oct 21 01:20:56 2022  -Region to plot : chr7:44074857-44274857.
Fri Oct 21 01:20:56 2022  -Extract SNPs in region : chr7:44074857-44274857...
Fri Oct 21 01:20:58 2022  -Extract SNPs in specified regions: 929
Fri Oct 21 01:20:58 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:20:58 2022 Start conversion and sanity check:
Fri Oct 21 01:20:58 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:20:58 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:20:58 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:20:58 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:20:58 2022  -Maximum -log10(P) values is 11.273680387889225 .
Fri Oct 21 01:20:58 2022 Finished data conversion and sanity check.
Fri Oct 21 01:20:58 2022 Start to create manhattan plot with 929 variants:
Fri Oct 21 01:20:58 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:07 2022  -plotting gene track..
Fri Oct 21 01:21:07 2022  -Finished plotting gene track..
Fri Oct 21 01:21:07 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:07 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:07 2022  -Skip annotating
Fri Oct 21 01:21:07 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:07 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:07 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:07 2022  -Plot layout mode is : r
Fri Oct 21 01:21:07 2022  -Region to plot : chr7:69306661-69506661.
Fri Oct 21 01:21:07 2022  -Extract SNPs in region : chr7:69306661-69506661...
Fri Oct 21 01:21:09 2022  -Extract SNPs in specified regions: 501
Fri Oct 21 01:21:09 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:09 2022 Start conversion and sanity check:
Fri Oct 21 01:21:09 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:09 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:09 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:09 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:09 2022  -Maximum -log10(P) values is 15.31238187042823 .
Fri Oct 21 01:21:09 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:09 2022 Start to create manhattan plot with 501 variants:
Fri Oct 21 01:21:09 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:18 2022  -plotting gene track..
Fri Oct 21 01:21:18 2022  -Finished plotting gene track..
Fri Oct 21 01:21:18 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:18 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:18 2022  -Skip annotating
Fri Oct 21 01:21:18 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:18 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:18 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:18 2022  -Plot layout mode is : r
Fri Oct 21 01:21:18 2022  -Region to plot : chr7:127153550-127353550.
Fri Oct 21 01:21:18 2022  -Extract SNPs in region : chr7:127153550-127353550...
Fri Oct 21 01:21:19 2022  -Extract SNPs in specified regions: 713
Fri Oct 21 01:21:19 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:19 2022 Start conversion and sanity check:
Fri Oct 21 01:21:19 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:19 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:19 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:19 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:19 2022  -Maximum -log10(P) values is 73.38711023071251 .
Fri Oct 21 01:21:19 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:19 2022 Start to create manhattan plot with 713 variants:
Fri Oct 21 01:21:19 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:28 2022  -plotting gene track..
Fri Oct 21 01:21:28 2022  -Finished plotting gene track..
Fri Oct 21 01:21:28 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:28 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:28 2022  -Skip annotating
Fri Oct 21 01:21:28 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:28 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:28 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:28 2022  -Plot layout mode is : r
Fri Oct 21 01:21:28 2022  -Region to plot : chr7:129925713-130125713.
Fri Oct 21 01:21:28 2022  -Extract SNPs in region : chr7:129925713-130125713...
Fri Oct 21 01:21:30 2022  -Extract SNPs in specified regions: 808
Fri Oct 21 01:21:30 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:30 2022 Start conversion and sanity check:
Fri Oct 21 01:21:30 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:30 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:30 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:30 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:30 2022  -Maximum -log10(P) values is 8.513144644723056 .
Fri Oct 21 01:21:30 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:30 2022 Start to create manhattan plot with 808 variants:
Fri Oct 21 01:21:30 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:39 2022  -plotting gene track..
Fri Oct 21 01:21:39 2022  -Finished plotting gene track..
Fri Oct 21 01:21:39 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:39 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:39 2022  -Skip annotating
Fri Oct 21 01:21:39 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:39 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:39 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:39 2022  -Plot layout mode is : r
Fri Oct 21 01:21:39 2022  -Region to plot : chr7:156938803-157138803.
Fri Oct 21 01:21:39 2022  -Extract SNPs in region : chr7:156938803-157138803...
Fri Oct 21 01:21:40 2022  -Extract SNPs in specified regions: 1121
Fri Oct 21 01:21:40 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:40 2022 Start conversion and sanity check:
Fri Oct 21 01:21:40 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:40 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:40 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:40 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:40 2022  -Maximum -log10(P) values is 7.948076083953893 .
Fri Oct 21 01:21:40 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:40 2022 Start to create manhattan plot with 1121 variants:
Fri Oct 21 01:21:41 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:50 2022  -plotting gene track..
Fri Oct 21 01:21:50 2022  -Finished plotting gene track..
Fri Oct 21 01:21:50 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:50 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:50 2022  -Skip annotating
In [9]:
Copied!
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803),region_grid=True,
                    vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz",taf=[4,0,0.95,1,1])
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803),region_grid=True, vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz",taf=[4,0,0.95,1,1])
Fri Oct 21 01:24:42 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:24:42 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:24:42 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:24:42 2022  -Plot layout mode is : r
Fri Oct 21 01:24:42 2022  -Region to plot : chr7:156538803-157538803.
Fri Oct 21 01:24:42 2022  -Extract SNPs in region : chr7:156538803-157538803...
Fri Oct 21 01:24:43 2022  -Extract SNPs in specified regions: 5831
Fri Oct 21 01:24:43 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:24:43 2022 Start conversion and sanity check:
Fri Oct 21 01:24:43 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:24:43 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:24:43 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:24:43 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:24:43 2022  -Maximum -log10(P) values is 7.948076083953893 .
Fri Oct 21 01:24:43 2022 Finished data conversion and sanity check.
Fri Oct 21 01:24:43 2022 Start to load reference genotype...
Fri Oct 21 01:24:43 2022  -reference vcf path : /home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz
Fri Oct 21 01:25:26 2022  -Retrieving index...
16384
Fri Oct 21 01:25:26 2022  -Calculating Rsq...
Fri Oct 21 01:25:26 2022 Finished loading reference genotype successfully!
Fri Oct 21 01:25:26 2022 Start to create manhattan plot with 5831 variants:
Fri Oct 21 01:25:26 2022  -Loading gtf files from:defualt
Fri Oct 21 01:25:35 2022  -plotting gene track..
Fri Oct 21 01:25:35 2022  -Finished plotting gene track..
Fri Oct 21 01:25:35 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:25:35 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:25:35 2022  -Skip annotating
Out[9]:
(<Figure size 1500x1000 with 4 Axes>, <gwaslab.Log.Log at 0x7fb2af198040>)
In [ ]:
Copied!


GWASLab is licensed under the MIT license
Documentation built with MkDocs.

Keyboard Shortcuts

Keys Action
? Open this help
n Next page
p Previous page
s Search